
Appl. No. 09/421,416 



In the Claims: 

This section sets forth a clean version of the entire set of pending claim(s) under 37 
C.F.R. 1.121(c)(3). Appendix A submitted herewith sets forth a marked up version of 
the prior pending claim(s) which have been amended by this Amendment with additions 
shown with underlining (e.g. new text) and deletions shown with a strikethrough (e.g. 
delete text) under 37 C.F.R. 1.121(b)(1)(iii). 

This amendment amends claims 1, 22, 39, 41-43, 45, 47 and 49, and cancels 
claim 44, without prejudice. 



1. A method for quantitatively representing documents in a vector space, 
comprising the steps of: 

identifying a first document to be processed from a plurality of documents; 

extracting a first feature corresponding to the first document from the plurality of 
documents, the first feature comprising text surrounding an image included in the 
document, the text surrounding the image not being anchor text; 

converting the first feature to a first vector; and 

associating the first vector with the first document. 

7. The method of claim 1 further comprising the steps of: 

extracting a second feature corresponding to the document, the second feature 
comprising a first URL representing the first document; 

converting the second feature to a second vector; and 
associating the second vector with the first document. 

8. The method of claim 7, wherein the step of converting the second feature 
comprises the sub-steps of: 

identifying each unique word within the URLs representing all documents in the 
collection of documents; and 

counting the occurrences of each unique word in the first URL; 

creating a vector having a number of dimensions equal to the number of unique 
words in the URLs representing all documents in the collection of documents, and 
further having as each element a numeric value representative of the number of 
occurrences in the first URL of the corresponding word. 
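As an illustration only, the sub-steps of claim 8 can be sketched as follows. The tokenization rule (splitting on non-alphanumeric characters), the function names, and the example URLs are assumptions made for this sketch and are not recited in the claim.

```python
import re
from collections import Counter

def url_words(url):
    """Split a URL into lowercase word tokens (the splitting rule is an assumption)."""
    return [w for w in re.split(r"[^A-Za-z0-9]+", url.lower()) if w]

def build_vocabulary(urls):
    """Every unique word in the URLs of all documents, in a fixed sorted order."""
    return sorted({w for u in urls for w in url_words(u)})

def url_vector(url, vocabulary):
    """One dimension per vocabulary word; each element is that word's count in `url`."""
    counts = Counter(url_words(url))
    return [counts[w] for w in vocabulary]

# Hypothetical two-document collection.
collection_urls = [
    "http://example.com/sports/news",
    "http://example.com/sports/sports-scores",
]
vocab = build_vocabulary(collection_urls)
vec = url_vector(collection_urls[1], vocab)
```

The vector has one element per unique word in the collection's URLs, so words that never occur in the first URL still occupy a dimension with a value of zero.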

9. The method of claim 8, wherein the value representative of the number of 
occurrences in the first URL of the corresponding word is calculated as the token 
frequency weight of the corresponding word multiplied by the inverse context frequency 
weight of the corresponding word. 
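The claims recite a product of a token frequency weight and an inverse context frequency weight without giving formulas for either. The sketch below assumes, purely for illustration, a log-damped count for the first and an IDF-style logarithm for the second; both formulas and all names are assumptions and may differ from the weights used in the application.

```python
import math

def token_frequency_weight(count):
    # Assumed form: 0 for an absent token, otherwise 1 + ln(count).
    return 0.0 if count == 0 else 1.0 + math.log(count)

def inverse_context_frequency_weight(n_contexts, n_containing):
    # Assumed IDF-style form: ln(total contexts / contexts containing the token).
    return math.log(n_contexts / n_containing)

def weighted_vector(counts, containing, n_contexts):
    """Multiply the two weights element by element; absent tokens stay at zero."""
    return [
        token_frequency_weight(c) * inverse_context_frequency_weight(n_contexts, k)
        if c else 0.0
        for c, k in zip(counts, containing)
    ]

# Hypothetical data: raw counts for three tokens, and how many of the
# four contexts (e.g. URLs) contain each token.
weights = weighted_vector(counts=[1, 0, 2], containing=[4, 1, 1], n_contexts=4)
```

Under these assumed formulas a token that appears in every context receives a weight of zero, while a token that is frequent in one context and rare elsewhere receives the largest weight.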

10. The method of claim 1 further comprising the steps of: 

extracting a second feature corresponding to the first document, the second 
feature comprising inlinks in the collection of documents linking to the first document; 
converting the second feature to a second vector; and 
associating the second vector with the first document. 

11. The method of claim 10, wherein the step of converting the second feature 
comprises the sub-steps of: 

identifying each document having links within the collection of documents; 
determining how many times each document having links points to the first 
document; and 

creating the second vector having a number of dimensions equal to the number of 
documents having links in the collection of documents, and the second vector further 
having as each element a numeric value representative of the number of links in each 
corresponding document linking to the first document. 
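A minimal sketch of the inlink vector of claim 11, assuming the link structure of the collection is available as a mapping from each document identifier to the list of identifiers it links to. The mapping, the identifiers, and the function name are hypothetical.

```python
def inlink_vector(target, links):
    """Return (sources, vector) for `target`.

    `links` maps each document id to the list of document ids it links to.
    `sources` lists the documents having links, in sorted order, and the
    vector holds how many times each of them links to `target`.
    """
    sources = sorted(doc for doc, outs in links.items() if outs)
    return sources, [links[s].count(target) for s in sources]

# Hypothetical collection: "c" has no links, so it contributes no dimension.
links = {"a": ["c", "c", "b"], "b": ["c"], "c": [], "d": ["a"]}
sources, vec = inlink_vector("c", links)
```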

12. The method of claim 11, wherein the numeric value representative of the number 
of links in each corresponding document linking to the first document is calculated as 
the token frequency weight of the corresponding link multiplied by the inverse context 
frequency weight of the corresponding link. 

13. The method of claim 10, wherein the step of converting the second feature 
comprises the sub-steps of: 

identifying each document having hyperlinks within the collection of documents, 
and further identifying each unique word associated with URLs defining hyperlinks in 
each document; 

counting the occurrences of each unique word in the URLs defining hyperlinks 
pointing to the first document; and 

creating the second vector having a number of dimensions equal to the number 
of unique words associated with URLs defining hyperlinks within the collection of 
documents, and the second vector further having as each element a numeric value 
representative of the number of occurrences in the URLs defining hyperlinks pointing to 
the first document of the corresponding word. 

14. The method of claim 13, wherein the numeric value representative of the number 
of occurrences in the URLs defining hyperlinks pointing to the first document of the 
corresponding word is calculated as the token frequency weight of the corresponding 
word multiplied by the inverse context frequency weight of the corresponding word. 

15. The method of claim 1 further comprising the steps of: 

extracting a second feature corresponding to the first document, the second 
feature comprising outlinks in the collection of documents linking to the first document; 
converting the second feature to a second vector; and 
associating the second vector with the first document. 



16. The method of claim 15, wherein the step of converting the second feature 
comprises the sub-steps of: 

identifying each other document linked to by all documents within the collection 
of documents; and 

creating the second vector having a number of dimensions equal to the number 
of other documents linked to by documents in the collection of documents, and the 
second vector further having as each element a numeric value representative of the 
number of links in the first document linking to each corresponding other document. 

17. The method of claim 16, wherein the numeric value representative of the number 
of links in the first document linking to each corresponding other document is calculated 
as the token frequency weight of the corresponding link multiplied by the inverse 
context frequency weight of the corresponding link. 

18. The method of claim 15, wherein the step of converting the second feature 
comprises the sub-steps of: 

identifying each unique word associated with URLs defining hyperlinks in each 
document in the collection of documents; 

counting the occurrences of each unique word in the URLs defining hyperlinks in 
the first document; and 

creating the second vector having a number of dimensions equal to the number 
of unique words associated with the URLs defining hyperlinks in each document, and 
the second vector further having as each element a numeric value representative of the 
number of occurrences in the URLs defining hyperlinks in the first document of the 
corresponding word. 



19. The method of claim 18, wherein the numeric value representative of the number 
of occurrences in the URLs defining hyperlinks in the first document of the 
corresponding word is calculated as the token frequency weight of the corresponding 
word multiplied by the inverse context frequency weight of the corresponding word. 

20. The method of claim 49, wherein the second feature comprises a text genre 
feature. 

21. The method of claim 20, wherein the step of converting the second feature 
comprises the sub-steps of: 

for each possible text genre, processing the first document to calculate the 
probability that the first document is of the corresponding text genre; and 

creating the second vector having a number of dimensions equal to the number 
of possible text genres, and the second vector further having as each element a 
numeric value representative of the probability that the first document is of the 
corresponding genre. 

22. The method of claim 49, wherein the first feature comprises the color histogram 
for the image included in the first document. 

23. The method of claim 22, wherein the step of converting the first feature 
comprises the sub-steps of: 

quantizing the image represented by the first document into a multi-dimensional 
color model; 

creating a color histogram having a plurality of bins for each dimension in the 
color model, each bin corresponding to a unique combination of binary bits representing 
information from the associated dimension of the color model; 

counting each of a plurality of pixels from the image in a corresponding bin 
associated with each dimension of the color model; and 

creating the first vector having a number of dimensions equal to the total number 
of bins in the color histogram, and the first vector further having as each element a 
numeric value representative of the number of pixels in the image corresponding to the 
corresponding histogram bin. 

24. The method of claim 23, wherein the plurality of pixels from the image in the 
counting step comprises all of the pixels in the image. 

25. The method of claim 24, wherein the plurality of pixels from the image in the 
counting step comprises an approximately uniformly spaced set of subsampled pixels 
from the image. 

26. The method of claim 23, wherein: 

the color model comprises a three-dimensional hue, saturation, and value color 
model; 

each dimension of the color model is represented by two bits of information; and 

the color histogram has four bins for each dimension in the color model, for a 
total of twelve bins. 
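The twelve-bin histogram of claims 23 and 26 can be sketched as below, assuming pixels arrive as 8-bit RGB triples and are converted with the standard-library `colorsys.rgb_to_hsv`. The input format, the quantization rule, and the function names are assumptions of this sketch.

```python
import colorsys

BITS = 2            # two bits per color-model dimension, as in claim 26
BINS = 1 << BITS    # four bins per dimension

def quantize(x):
    """Map a value in [0, 1] to a bin index in 0..3 (the value 1.0 falls in the top bin)."""
    return min(int(x * BINS), BINS - 1)

def hsv_histogram(pixels):
    """Return a 12-element vector: 4 hue bins, then 4 saturation bins, then 4 value bins.

    Each pixel is counted once in each dimension, so the elements sum to
    three times the number of pixels.
    """
    hist = [0] * (3 * BINS)
    for r, g, b in pixels:
        hsv = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        for dim, value in enumerate(hsv):
            hist[dim * BINS + quantize(value)] += 1
    return hist

# Hypothetical three-pixel image: pure red, black, white.
pixels = [(255, 0, 0), (0, 0, 0), (255, 255, 255)]
hist = hsv_histogram(pixels)
```

Passing every pixel corresponds to claim 24; passing a regularly spaced subset of pixels would correspond to the subsampled variant of claim 25.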

27. The method of claim 23, wherein the image represented by the first document 
comprises a region of a bitmap. 

28. The method of claim 49, wherein the first feature comprises the color complexity 
of an image represented by the first document. 

29. The method of claim 28, wherein the step of converting the first feature 
comprises the sub-steps of: 

quantizing the image represented by the first document into a multi-dimensional 
color model; 

determining the maximum number of pixels in any row in any image represented 
by any document in the collection of documents; 

determining the maximum number of pixels in any column in any image 
represented by any document in the collection of documents; 

creating a horizontal complexity histogram and a vertical complexity histogram, 
each having a number of bins equal to the maximum number of pixels in any row and in 
any column, respectively; 

identifying horizontal runs of pixels of all possible lengths in the quantized image, 
and for each possible length, counting the number of pixels in a plurality of rows of the 
quantized image belonging to the horizontal runs in a corresponding bin of the 
horizontal complexity histogram; 

identifying vertical runs of pixels of all possible lengths in the quantized image, 
and for each possible length, counting the number of pixels in a plurality of columns of 
the quantized image belonging to the vertical runs in a corresponding bin of the 
horizontal complexity histogram; 

creating a horizontal complexity vector having a number of dimensions equal to 
the maximum number of pixels in any row, and further having as each element a 
numeric value representing the number of pixels in the image in the corresponding 
horizontal histogram bin; and 

creating a vertical complexity vector having a number of dimensions equal to the 
maximum number of pixels in any column, and further having as each element a 
numeric value representing the number of pixels in the image in the corresponding 
vertical histogram bin. 
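One way to picture the run-length counting of claim 29 is the sketch below. It assumes the quantized image is a list of rows of integer color codes and treats a run as a maximal sequence of equal codes along a row or column; that definition of a run, the names, and the sample image are assumptions of this sketch.

```python
from itertools import groupby

def run_histogram(lines, n_bins):
    """Bin i (0-based) accumulates the number of pixels lying in runs of length i + 1."""
    hist = [0] * n_bins
    for line in lines:
        for _, run in groupby(line):
            length = sum(1 for _ in run)
            hist[length - 1] += length  # count pixels, not runs
    return hist

def complexity_vectors(image, max_row_pixels, max_col_pixels):
    """Return (horizontal, vertical) complexity vectors for one quantized image."""
    horizontal = run_histogram(image, max_row_pixels)
    vertical = run_histogram(zip(*image), max_col_pixels)  # zip(*image) yields columns
    return horizontal, vertical

# Hypothetical 3x4 quantized image; here it is also the largest in the collection.
image = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [2, 2, 2, 2],
]
h, v = complexity_vectors(image, max_row_pixels=4, max_col_pixels=3)
```

Both vectors sum to the pixel count of the image, because each pixel belongs to exactly one horizontal run and exactly one vertical run.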

30. The method of claim 29, wherein the plurality of rows comprises all rows of the 
quantized image, and wherein the plurality of columns comprises all columns of the 
quantized image. 

31. The method of claim 29, wherein the plurality of rows comprises an 
approximately uniformly spaced set of subsampled rows from the image, and wherein 
the plurality of columns comprises an approximately uniformly spaced set of 
subsampled columns from the image. 

32. The method of claim 29, wherein: 

the color model comprises a three-dimensional hue, saturation, and value color 
model; and 

each dimension of the color model is represented by two bits of information. 

33. The method of claim 29 further comprising the step of concatenating the 
horizontal complexity vector and the vertical complexity vector to form a complexity 
vector having a number of dimensions equal to the maximum number of pixels in any 
row plus the maximum number of pixels in any column. 



34. The method of claim 28, wherein the step of converting the first feature 
comprises the sub-steps of: 

quantizing the image represented by the first document into a multi-dimensional 
color model; 

determining the maximum number of pixels in any row in any image represented 
by any document in the collection of documents; 

determining the maximum number of pixels in any column in any image 
represented by any document in the collection of documents; 

creating a horizontal complexity histogram and a vertical complexity histogram, 
each having a selected number of bins corresponding to a plurality of quantized ranges 
of run lengths; 

identifying horizontal runs of pixels of all possible lengths in the quantized image, 
and for each possible length, counting the number of pixels in a plurality of rows of the 
quantized image belonging to the horizontal runs in a corresponding bin of the 
horizontal complexity histogram; 

identifying vertical runs of pixels of all possible lengths in the quantized image, 
and for each possible length, counting the number of pixels in a plurality of columns of 
the quantized image belonging to the vertical runs in a corresponding bin of the 
horizontal complexity histogram; 

creating a horizontal complexity vector having a number of dimensions equal to 
the selected number of bins in the horizontal complexity histogram, and further having 
as each element a numeric value representing the number of pixels in the image in the 
corresponding horizontal histogram bin; and 

creating a vertical complexity vector having a number of dimensions equal to the 
number of bins in the vertical complexity histogram, and further having as each element 
a numeric value representing the number of pixels in the image in the corresponding 
vertical histogram bin. 

35. The method of claim 34, wherein: 

a bin b_x in the horizontal complexity histogram corresponding to a horizontal run 
of length r_x is identified by a relationship b_x = floor(r_x(N-1) / (n_x/4)) + 1, where N is the 
selected number of bins in the horizontal complexity histogram and n_x is a maximum 
number of pixels in any row of an image in the collection; and 

a bin b_y in the vertical complexity histogram corresponding to a vertical run of 
length r_y is identified by a relationship b_y = floor(r_y(N-1) / (n_y/4)) + 1, where N is the 
selected number of bins in the horizontal complexity histogram and n_y is a maximum 
number of pixels in any row of an image in the collection. 
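Reading the relationship of claim 35 as b = floor(r(N-1) / (n/4)) + 1, the bin index for a run can be computed as sketched below. Taken literally, the expression exceeds N for runs longer than about a quarter of the maximum dimension, so this sketch clamps the result to N. The clamp is an assumption made here and is not recited in the claim; the function and parameter names are likewise hypothetical.

```python
import math

def run_length_bin(run_length, n_bins, max_pixels):
    """1-based bin index for a run of `run_length` pixels.

    n_bins     -- N, the selected number of histogram bins
    max_pixels -- n, the maximum number of pixels in any row (or column)
    """
    b = math.floor(run_length * (n_bins - 1) / (max_pixels / 4)) + 1
    return min(b, n_bins)  # assumption: long runs all fall in the last bin

# Hypothetical setting: 8 bins, images at most 64 pixels wide.
bins = [run_length_bin(r, n_bins=8, max_pixels=64) for r in (1, 8, 16, 64)]
```

With these example values, runs from 16 pixels upward share the final bin, which concentrates the histogram's resolution on short runs.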

36. The method of claim 34, wherein the plurality of rows comprises an 
approximately uniformly spaced set of subsampled rows from the image, and wherein 
the plurality of columns comprises an approximately uniformly spaced set of 
subsampled columns from the image. 

37. The method of claim 34, wherein: 

the color model comprises a three-dimensional hue, saturation, and value color 
model; and 

each dimension of the color model is represented by two bits of information. 

38. The method of claim 34, further comprising the step of concatenating the 
horizontal complexity vector and the vertical complexity vector to form a complexity 
vector having a number of dimensions equal to the selected number of bins in the 
horizontal complexity histogram plus the selected number of bins in the vertical 
complexity histogram. 

39. A signal representing instructions for quantitatively representing in a vector 
space users of a collection of documents, the instructions comprising: 

identifying a first user to be processed from the users of the collection of 
documents; 

extracting from the collection of documents a first feature representing a first 
sub-set of documents of the collection that have been accessed by the first user; 
converting the first feature to a first vector; and 
associating the first vector with the first user. 

41. The signal of claim 39, wherein the converting instruction comprises: 

identifying each unique document in the collection of documents; 

calculating the number of times the first user accessed each document in the 
collection of documents; and 

creating the first vector having a number of dimensions equal to the number of 
documents in the collection of documents, and the first vector further having as each 
element a numeric value representative of the number of times the first user has 
accessed the corresponding document. 
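The user vector of claim 41 can be sketched in the same way, assuming access records are available as (user, document) pairs. The log format, identifiers, and function name are hypothetical.

```python
from collections import Counter

def user_vector(user, access_log, documents):
    """One dimension per document; each element is how often `user` accessed it."""
    counts = Counter(doc for u, doc in access_log if u == user)
    return [counts[d] for d in documents]

# Hypothetical collection and access log.
documents = ["d1", "d2", "d3"]
access_log = [("ann", "d1"), ("bob", "d2"), ("ann", "d1"), ("ann", "d3")]
vec = user_vector("ann", access_log, documents)
```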



42. The signal of claim 41, wherein the value representative of the number of times 
the first user has accessed the corresponding document is calculated as the token 
frequency weight of the corresponding document multiplied by the inverse context 
frequency weight of the corresponding document. 

43. A computer-readable medium containing instructions for causing a computer 
system to quantitatively representing documents in a vector space, by the steps of: 

identifying a document to be processed from a plurality of documents; 

selecting an image feature as a first feature, the image feature being associated 
with the non-text content of an image included in the document; 

extracting from the document information associated with the first feature; 

converting information associated with the first feature into a first vector; 

associating the first vector with the document; 

selecting a second feature from a set of multi-modal features including a user 
information feature and a genre feature; 

extracting from the document information associated with the second feature; 

converting the information associated with the second feature into a second 
vector; and 

associating the second vector with the document. 

45. The computer-readable medium of claim 43 wherein the first feature comprises 
a color histogram for the image included in the document. 

46. The computer-readable medium of claim 45 wherein converting the information 
associated with the first feature into the first vector comprises the steps of: 

quantizing the image included in the document into a multi-dimensional color 
model; 

creating a color histogram having a plurality of bins for each dimension in the 
color model, each bin corresponding to a unique combination of binary bits representing 
information from the associated dimension of the color model; 

counting each of a plurality of pixels from the image in a corresponding bin 
associated with each dimension of the color model; and 

creating a vector having a number of dimensions equal to the total number of 
bins in the color histogram, and further having as each element a numeric value 
representative of the number of pixels in the image corresponding to the corresponding 
histogram bin. 

47. The computer-readable medium of claim 43 wherein the first feature comprises 
color complexity of the image included in the document. 

48. The computer-readable medium of claim 47 wherein converting the information 
associated with the first feature into the first vector comprises the steps of: 

quantizing the image included in the document into a multi-dimensional color 
model; 

determining the maximum number of pixels in any row in any image represented 
by any document in the collection of documents; 

determining the maximum number of pixels in any column in any image 
represented by any document in the collection of documents; 

creating a horizontal complexity histogram and a vertical complexity histogram, 
each having a number of bins equal to the maximum number of pixels in any row and in 
any column, respectively; 

identifying horizontal runs of pixels of all possible lengths in the quantized image, 
and for each possible length, counting the number of pixels in a plurality of rows of the 
quantized image belonging to the horizontal runs in a corresponding bin of the 
horizontal complexity histogram; 

identifying vertical runs of pixels of all possible lengths in the quantized image, 
and for each possible length, counting the number of pixels in a plurality of columns of 
the quantized image belonging to the vertical runs in a corresponding bin of the 
horizontal complexity histogram; 

creating a horizontal complexity vector having a number of dimensions equal to 
the maximum number of pixels in any row, and further having as each element a 
numeric value representing the number of pixels in the image in the corresponding 
horizontal histogram bin; and 

creating a vertical complexity vector having a number of dimensions equal to the 
maximum number of pixels in any column, and further having as each element a 
numeric value representing the number of pixels in the image in the corresponding 
vertical histogram bin. 

49. A method for quantitatively representing documents in a vector space, 
comprising the steps of: 

identifying a first document to be processed from a plurality of documents; 

extracting a first feature corresponding to the first document from the plurality of 
documents, the first feature comprising an image feature associated with non-text 
content of an image included in the first document; 

converting the first feature to a first vector; 

associating the first vector with the first document; 

extracting a second feature corresponding to the document, the second feature 
comprising a one of a user feature and a text genre feature; 
converting the second feature into a second vector; and 
associating the second vector with the first document. 



REMARKS 

The Office Action of August 19, 2002 has been carefully considered. 
Reconsideration of this application, as amended, is respectfully requested. Claims 1, 
7-39, 41-43 and 45-49 are pending in this application. Of these, claims 1, 39, 43, and 49 
are independent. In this Amendment, claims 1, 22, 39, 41-43, 45, 47 and 49 have been 
amended, and claim 44 has been cancelled, without prejudice. 